{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# VGG\n",
    "AlexNet 作者做了很多实验，只是说明深度学习可以取得很牛的结果，但是没有提供网络的设计准则。\n",
    "本节介绍VGG，它的名字来源于论文作者所在的实验室Visual Geometry Group [1]。\n",
    "\n",
    "##  VGG块\n",
    "\n",
    "VGG块的组成规律是：连续使用数个相同的填充为1、窗口形状为$3\\times 3$的卷积层后接上一个步幅为2、窗口形状为$2\\times 2$的最大池化层。卷积层保持输入的高和宽不变，而池化层则对其减半。我们使用`vgg_block`函数来实现这个基础的VGG块，它可以指定卷积层的数量和输入输出通道数\n",
    "\n",
    "> 对于给定的感受野（与输出有关的输入图片的局部大小），采用堆积的小卷积核优于采用大的卷积核，因为可以增加网络深度来保证学习更复杂的模式，而且代价还比较小（参数更少）。例如，在VGG中，使用了3个3x3卷积核来代替7x7卷积核，使用了2个3x3卷积核来代替5*5卷积核，这样做的主要目的是在保证具有相同感知野的条件下，提升了网络的深度，在一定程度上提升了神经网络的效果。\n",
    "\n"
   ]
  },
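  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why the receptive fields match: stacking $n$ convolutions of size $k\\times k$ with stride 1 yields a receptive field of $n(k-1)+1$. Two stacked $3\\times 3$ layers therefore see a $5\\times 5$ region ($2\\cdot 2+1=5$), and three stacked layers see a $7\\times 7$ region ($3\\cdot 2+1=7$).\n"
   ]
  },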
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import torch\n",
    "from torch import nn, optim\n",
    "\n",
    "import sys\n",
    "sys.path.append(\"..\") \n",
    "torch.backends.cudnn.benchmark = True\n",
    "torch.backends.cudnn.deterministic = False\n",
    "torch.backends.cudnn.enabled = True\n",
    "\n",
    "import dl_utils\n",
    "\n",
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "\n",
    "def vgg_block(num_convs, in_channels, out_channels):\n",
    "    blk = []\n",
    "    for i in range(num_convs):\n",
    "        if i == 0:\n",
    "            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))\n",
    "        else:\n",
    "            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))\n",
    "        blk.append(nn.ReLU())\n",
    "    blk.append(nn.MaxPool2d(kernel_size=2, stride=2)) # 这里会使宽高减半\n",
    "    return nn.Sequential(*blk)\n",
    "\n"
   ]
  },
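  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the parameter saving concrete, here is a quick check (a sketch; the channel count of 64 is an arbitrary choice for illustration): three stacked $3\\times 3$ convolutions with $C$ channels use $3(3^2C^2+C)$ parameters, versus $7^2C^2+C$ for a single $7\\times 7$ convolution with the same receptive field.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "C = 64  # arbitrary channel count for illustration\n",
    "# three stacked 3x3 convolutions vs a single 7x7 with the same receptive field\n",
    "stacked = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, padding=1) for _ in range(3)])\n",
    "single = nn.Conv2d(C, C, kernel_size=7, padding=3)\n",
    "\n",
    "def num_params(m):\n",
    "    return sum(p.numel() for p in m.parameters())\n",
    "\n",
    "print(num_params(stacked), 'vs', num_params(single))  # 110784 vs 200768\n"
   ]
  },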
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在我们构造一个VGG网络。它有5个卷积块，前2块使用单卷积层，而后3块使用双卷积层。第一块的输入输出通道分别是1（因为下面要使用的Fashion-MNIST数据的通道数为1）和64，之后每次对输出通道数翻倍，直到变为512。因为这个网络使用了8个卷积层和3个全连接层，所以经常被称为VGG-11。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))\n",
    "# 经过5个vgg_block, 宽高会减半5次, 变成 224/32 = 7\n",
    "fc_features = 512 * 7 * 7 # c * w * h\n",
    "fc_hidden_units = 4096 # 任意"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def vgg(conv_arch, fc_features, fc_hidden_units=4096):\n",
    "    net = nn.Sequential()\n",
    "    # 卷积层部分\n",
    "    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):\n",
    "        # 每经过一个vgg_block都会使宽高减半\n",
    "        net.add_module(\"vgg_block_\" + str(i+1), vgg_block(num_convs, in_channels, out_channels))\n",
    "    # 全连接层部分\n",
    "    net.add_module(\"fc\", nn.Sequential(dl_utils.FlattenLayer(),\n",
    "                                 nn.Linear(fc_features, fc_hidden_units),\n",
    "                                 nn.ReLU(),\n",
    "                                 nn.Dropout(0.5),\n",
    "                                 nn.Linear(fc_hidden_units, fc_hidden_units),\n",
    "                                 nn.ReLU(),\n",
    "                                 nn.Dropout(0.5),\n",
    "                                 nn.Linear(fc_hidden_units, 10)\n",
    "                                ))\n",
    "    return net"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vgg_block_1 output shape:  torch.Size([1, 64, 112, 112])\n",
      "vgg_block_2 output shape:  torch.Size([1, 128, 56, 56])\n",
      "vgg_block_3 output shape:  torch.Size([1, 256, 28, 28])\n",
      "vgg_block_4 output shape:  torch.Size([1, 512, 14, 14])\n",
      "vgg_block_5 output shape:  torch.Size([1, 512, 7, 7])\n",
      "fc output shape:  torch.Size([1, 10])\n"
     ]
    }
   ],
   "source": [
    "net = vgg(conv_arch, fc_features, fc_hidden_units)\n",
    "X = torch.rand(1, 1, 224, 224)\n",
    "\n",
    "# named_children获取一级子模块及其名字(named_modules会返回所有子模块,包括子模块的子模块)\n",
    "for name, blk in net.named_children(): \n",
    "    X = blk(X)\n",
    "    print(name, 'output shape: ', X.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，每次我们将输入的高和宽减半，直到最终高和宽变成7后传入全连接层。与此同时，输出通道数每次翻倍，直到变成512。因为每个卷积层的窗口大小一样，所以每层的模型参数尺寸和计算复杂度与输入高、输入宽、输入通道数和输出通道数的乘积成正比。VGG这种高和宽减半以及通道翻倍的设计使得多数卷积层都有相同的模型参数尺寸和计算复杂度。"
   ]
  },
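  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A rough per-layer estimate makes this concrete. The sketch below counts only the multiply-accumulates (MACs) and weights of the convolutions in `conv_arch`; biases, ReLU, and pooling are ignored:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "h = w = 224\n",
    "for num_convs, c_in, c_out in conv_arch:\n",
    "    for i in range(num_convs):\n",
    "        cin = c_in if i == 0 else c_out\n",
    "        macs = h * w * 3 * 3 * cin * c_out   # output positions x kernel size x channels\n",
    "        params = 3 * 3 * cin * c_out         # weights only\n",
    "        print('conv %4d->%-4d at %3dx%-3d: %8.1fM MACs, %8.1fK params' % (cin, c_out, h, w, macs/1e6, params/1e3))\n",
    "    h //= 2; w //= 2  # the pooling layer at the end of each block halves h and w\n"
   ]
  },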
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 获取数据和训练模型\n",
    "\n",
    "因为VGG-11计算上比AlexNet更加复杂，出于测试的目的我们构造一个通道数更小，或者说更窄的网络在Fashion-MNIST数据集上进行训练。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sequential(\n",
      "  (vgg_block_1): Sequential(\n",
      "    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (vgg_block_2): Sequential(\n",
      "    (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (vgg_block_3): Sequential(\n",
      "    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (3): ReLU()\n",
      "    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (vgg_block_4): Sequential(\n",
      "    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (3): ReLU()\n",
      "    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (vgg_block_5): Sequential(\n",
      "    (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
      "    (3): ReLU()\n",
      "    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (fc): Sequential(\n",
      "    (0): FlattenLayer()\n",
      "    (1): Linear(in_features=3136, out_features=512, bias=True)\n",
      "    (2): ReLU()\n",
      "    (3): Dropout(p=0.5)\n",
      "    (4): Linear(in_features=512, out_features=512, bias=True)\n",
      "    (5): ReLU()\n",
      "    (6): Dropout(p=0.5)\n",
      "    (7): Linear(in_features=512, out_features=10, bias=True)\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "ratio = 8\n",
    "small_conv_arch = [(1, 1, 64//ratio), (1, 64//ratio, 128//ratio), (2, 128//ratio, 256//ratio), \n",
    "                   (2, 256//ratio, 512//ratio), (2, 512//ratio, 512//ratio)]\n",
    "net = vgg(small_conv_arch, fc_features // ratio, fc_hidden_units // ratio)\n",
    "print(net)"
   ]
  },
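  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, a quick count of the narrow network's parameters (a sketch using the `net` just built):\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "total = sum(p.numel() for p in net.parameters())\n",
    "print('%.2fM parameters' % (total / 1e6))\n"
   ]
  },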
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "training on  cuda\n",
      "epoch 1, loss 0.5819, train acc 0.781, test acc 0.878, time 64.7 sec\n",
      "epoch 2, loss 0.1622, train acc 0.882, test acc 0.891, time 62.0 sec\n",
      "epoch 3, loss 0.0911, train acc 0.901, test acc 0.911, time 61.2 sec\n",
      "epoch 4, loss 0.0611, train acc 0.913, test acc 0.905, time 62.8 sec\n",
      "epoch 5, loss 0.0434, train acc 0.921, test acc 0.916, time 62.3 sec\n"
     ]
    }
   ],
   "source": [
    "batch_size = 64\n",
    "# 如出现“out of memory”的报错信息，可减小batch_size或resize\n",
    "train_iter, test_iter = dl_utils.load_data_fashion_mnist(batch_size, resize=224)\n",
    "\n",
    "lr, num_epochs = 0.001, 5\n",
    "optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
    "dl_utils.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 未用CUDNN 63s，使用 cudnn.\n",
    "torch.backends.cudnn.benchmark = True\n",
    "torch.backends.cudnn.deterministic = False\n",
    "torch.backends.cudnn.enabled = True"
   ]
  },
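  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch to measure the effect yourself (assumptions: a CUDA device is available and `net` already lives on `device`, as it does after `train_ch5` above; a warm-up pass is included so cuDNN's one-off autotuning cost is not counted):\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def time_forward(benchmark, reps=10):\n",
    "    torch.backends.cudnn.benchmark = benchmark\n",
    "    x = torch.rand(64, 1, 224, 224, device=device)\n",
    "    with torch.no_grad():\n",
    "        net(x)  # warm-up (triggers autotuning when benchmark is on)\n",
    "        torch.cuda.synchronize()\n",
    "        start = time.time()\n",
    "        for _ in range(reps):\n",
    "            net(x)\n",
    "        torch.cuda.synchronize()\n",
    "    return (time.time() - start) / reps\n",
    "\n",
    "print('benchmark off: %.4f s/forward' % time_forward(False))\n",
    "print('benchmark on : %.4f s/forward' % time_forward(True))\n"
   ]
  }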
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
